Non-linear spectral contrast stretching for in-car speech recognition
نویسندگان
چکیده
In this paper, we present a novel feature normalization method in the log-scaled spectral domain for improving the noise robustness of speech recognition front-ends. In the proposed scheme, a non-linear contrast stretching is added to the outputs of log mel-filterbanks (MFB) to imitate the adaptation of the auditory system under adverse conditions. This is followed by a two-dimensional filter to smooth out the processing artifacts. The proposed MFCC front-ends perform remarkably well on CENSREC-2 in-car database with an average relative improvement of 29.3% compared to baseline MFCC system. It is also confirmed that the proposed processing in log MFB domain can be integrated with conventional cepstral post-processing techniques to yield further improvements. The proposed algorithm is simple and requires only a small extra computation load.
منابع مشابه
Mon.O2a.05 Sub-band Based Log-energy and Its Dynamic Range Stretching for Robust In-car Speech Recognition
Log energy and its delta parameters, typically derived from full-band spectrum, are commonly used in automatic speech recognition (ASR) systems. In this paper, we address the problem of estimating log energy in the presence of background noise (usually resulting in a reduction in dynamic ranges of spectral energies). We theoretically show that the background noise affects the trajectories of th...
متن کاملSub-band based Log-energy and Its Dynamic Range Stretching for Robust In-car Speech Recognition
Log energy and its delta parameters, typically derived from full-band spectrum, are commonly used in automatic speech recognition (ASR) systems. In this paper, we address the problem of estimating log energy in the presence of background noise (usually resulting in a reduction in dynamic ranges of spectral energies). We theoretically show that the background noise affects the trajectories of th...
متن کاملCorrelation between Auditory Spectral Resolution and Speech Perception in Children with Cochlear Implants
Background: Variability in speech performance is a major concern for children with cochlear implants (CIs). Spectral resolution is an important acoustic component in speech perception. Considerable variability and limitations of spectral resolution in children with CIs may lead to individual differences in speech performance. The aim of this study was to assess the correlation between auditory ...
متن کاملA Database for Automatic Persian Speech Emotion Recognition: Collection, Processing and Evaluation
Abstract Recent developments in robotics automation have motivated researchers to improve the efficiency of interactive systems by making a natural man-machine interaction. Since speech is the most popular method of communication, recognizing human emotions from speech signal becomes a challenging research topic known as Speech Emotion Recognition (SER). In this study, we propose a Persian em...
متن کاملClassification of emotional speech using spectral pattern features
Speech Emotion Recognition (SER) is a new and challenging research area with a wide range of applications in man-machine interactions. The aim of a SER system is to recognize human emotion by analyzing the acoustics of speech sound. In this study, we propose Spectral Pattern features (SPs) and Harmonic Energy features (HEs) for emotion recognition. These features extracted from the spectrogram ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2007